AITopics | intent recognition

Collaborating Authors

intent recognition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla

Islam, Ariful, Mahmud, Tanvir, Hossen, Md Rifat

arXiv.org Artificial IntelligenceDec-1-2025

The expansion of the Internet and social networks has led to an explosion of user-generated content. Author intent understanding plays a crucial role in interpreting social media content. This paper addresses author intent classification in Bangla social media posts by leveraging both textual and visual data. Recognizing limitations in previous unimodal approaches, we systematically benchmark transformer-based language models (mBERT, DistilBERT, XLM-RoBERTa) and vision architectures (ViT, Swin, SwiftFormer, ResNet, DenseNet, MobileNet), utilizing the Uddessho dataset of 3,048 posts spanning six practical intent categories. We introduce a novel intermediate fusion strategy that significantly outperforms early and late fusion on this task. Experimental results show that intermediate fusion, particularly with mBERT and Swin Transformer, achieves 84.11% macro-F1 score, establishing a new state-of-the-art with an 8.4 percentage-point improvement over prior Bangla multimodal approaches. Our analysis demonstrates that integrating visual context substantially enhances intent classification. Cross-modal feature integration at intermediate levels provides optimal balance between modality-specific representation and cross-modal learning. This research establishes new benchmarks and methodological standards for Bangla and other low-resource languages. We call our proposed framework BangACMM (Bangla Author Content MultiModal).

classification, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.23287

Genre: Research Report (0.84)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition

Liu, HongYu, Li, Junxin, Guo, Changxi, Chen, Hao, Huang, Yaqian, Guo, Yifu, Yang, Huan, Cai, Lihua

arXiv.org Artificial IntelligenceNov-18-2025

Recognizing speaker intent in long audio dialogues among speakers has a wide range of applications, but is a non-trivial AI task due to complex inter-dependencies in speaker utterances and scarce annotated data. To address these challenges, an end-to-end framework, namely DialogGraph-LLM, is proposed in the current work. DialogGraph-LLM combines a novel Multi-Relational Dialogue Attention Network (MR-DAN) architecture with multimodal foundation models (e.g., Qwen2.5-Omni-7B) for direct acoustic-to-intent inference. An adaptive semi-supervised learning strategy is designed using LLM with a confidence-aware pseudo-label generation mechanism based on dual-threshold filtering using both global and class confidences, and an entropy-based sample selection process that prioritizes high-information unlabeled instances. Extensive evaluations on the proprietary MarketCalls corpus and the publicly available MIntRec 2.0 benchmark demonstrate DialogGraph-LLM's superiority over strong audio and text-driven baselines. The framework demonstrates strong performance and efficiency in intent recognition in real world scenario audio dialogues, proving its practical value for audio-rich domains with limited supervision.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA251182

2511.11

Country: Asia > China (0.28)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics

Zhu, Ziqi, Hu, Tao, Zhang, Honglong, Yang, Dan, Chen, HanGeng, Zhang, Mengran, Chen, Xilun

arXiv.org Artificial IntelligenceNov-13-2025

We present CID-GraphRAG (Conversational Intent-Driven Graph Retrieval Augmented Generation), a novel framework that addresses the limitations of existing dialogue systems in maintaining both contextual coherence and goal-oriented progression in multi-turn customer service conversations. Unlike traditional RAG systems that rely solely on semantic similarity (Conversation RAG) or standard knowledge graphs (GraphRAG), CID-GraphRAG constructs dynamic intent transition graphs from goal achieved historical dialogues and implements a dual-retrieval mechanism that adaptively balances intent-based graph traversal with semantic search. This approach enables the system to simultaneously leverage both conversional intent flow patterns and contextual semantics, significantly improving retrieval quality and response quality. In extensive experiments on real-world customer service dialogues, we employ both automatic metrics and LLM-as-judge assessments, demonstrating that CID-GraphRAG significantly outperforms both semantic-based Conversation RAG and intent-based GraphRAG baselines across all evaluation criteria. Quantitatively, CID-GraphRAG demonstrates substantial improvements over Conversation RAG across automatic metrics, with relative gains of 11% in BLEU, 5% in ROUGE-L, 6% in METEOR, and most notably, a 58% improvement in response quality according to LLM-as-judge evaluations. These results demonstrate that the integration of intent transition structures with semantic retrieval creates a synergistic effect that neither approach achieves independently, establishing CID-GraphRAG as an effective framework for addressing the challenges of maintaining contextual coherence and goal-oriented progression in knowledge-intensive multi-turn dialogues.

cid-graphrag, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.19385

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion

Huang, Haofeng, Han, Yifei, Zhang, Long, Li, Bin, He, Yangfan

arXiv.org Artificial IntelligenceSep-24-2025

ABSTRACT Multimodal intent recognition (MMIR) suffers from weak semantic grounding and poor robustness under noisy or rare-class conditions. We propose MVCL-DAF++, which extends MVCL-DAF with two key modules: (1) Prototype-aware contrastive alignment, aligning instances to class-level prototypes to enhance semantic consistency; and (2) Coarse-to-fine attention fusion, integrating global modality summaries with token-level features for hierarchical cross-modal interaction. These results demonstrate the effectiveness of prototype-guided learning and coarse-to-fine fusion for robust multimodal understanding. Index T erms-- Multimodal intent recognition, Prototype-aware contrastive alignment, Coarse-to-fine dynamic attention fusion 1. INTRODUCTION Multimodal intent recognition (MMIR) [1] aims to infer user intentions by integrating heterogeneous signals such as spoken language, facial expressions, and vocal intonations. With the rapid adoption of human-centered AI systems [2], robust and generalizable multimodal understanding has become a cornerstone for building intelligent conversational agents [3, 4].

artificial intelligence, belief revision, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.17446

Country: Asia > China (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Tactile-Based Human Intent Recognition for Robot Assistive Navigation

Peng, Shaoting, Crowder, Dakarai, Yuan, Wenzhen, Driggs-Campbell, Katherine

arXiv.org Artificial IntelligenceSep-23-2025

Abstract-- Robot assistive navigation (RAN) is critical for enhancing the mobility and independence of the growing population of mobility-impaired individuals. However, existing systems often rely on interfaces that fail to replicate the intuitive and efficient physical communication observed between a person and a human caregiver, limiting their effectiveness. In this paper, we introduce T ac-Nav, a RAN system that leverages a cylindrical tactile skin mounted on a Stretch 3 mobile manipulator to provide a more natural and efficient interface for human navigational intent recognition. T o robustly classify the tactile data, we developed the Cylindrical Kernel Support V ector Machine (CK-SVM), an algorithm that explicitly models the sensor's cylindrical geometry and is consequently robust to the natural rotational shifts present in a user's grasp. Comprehensive experiments were conducted to demonstrate the effectiveness of our classification algorithm and the overall system. Results show that CK-SVM achieved superior classification accuracy on both simulated (97.1%) and real-world (90.8%) datasets compared to four baseline models. Furthermore, a pilot study confirmed that users more preferred the T ac-Nav tactile interface over conventional joystick and voice-based controls. I. INTRODUCTION Robot assistive navigation (RAN) - the task of a robot providing physical support while people moving from one place to another - is of critical importance for people with mobility impairments [1]. In the U.S., 12.2% of adults live with a mobility disability, and the aging population suggests an increasing need for navigation assistance [2], [3].

artificial intelligence, belief revision, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.16353

Country: North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.62)
(2 more...)

Add feedback

Evaluating Multi-Turn Bargain Skills in LLM-Based Seller Agent

Wang, Issue Yishu, Chong, Kakam, Wang, Xiaofeng, Yan, Xu, Kong, DeXin, Ju, Chen, Chen, Ming, Xiao, Shuai, Han, Shuguang, chen, jufeng

arXiv.org Artificial IntelligenceSep-9-2025

In online second-hand marketplaces, multi-turn bargaining is a crucial part of seller-buyer interactions. Large Language Models (LLMs) can act as seller agents, negotiating with buyers on behalf of sellers under given business constraints. A critical ability for such agents is to track and accurately interpret cumulative buyer intents across long negotiations, which directly impacts bargaining effectiveness. We introduce a multi-turn evaluation framework for measuring the bargaining ability of seller agents in e-commerce dialogues. The framework tests whether an agent can extract and track buyer intents. Our contributions are: (1) a large-scale e-commerce bargaining benchmark spanning 622 categories, 9,892 products, and 3,014 tasks; (2) a turn-level evaluation framework grounded in Theory of Mind (ToM) with annotated buyer intents, moving beyond outcome-only metrics; and (3) an automated pipeline that extracts reliable intent from massive dialogue data.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.06341

Country:

Europe > Belgium (0.15)
North America > United States (0.14)
Europe > Denmark (0.14)

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Enhancing Low-Altitude Airspace Security: MLLM-Enabled UAV Intent Recognition

Lei, Guangyu, Liang, Tianhao, Ping, Yuqi, Chen, Xinglin, Zhou, Longyu, Wu, Junwei, Zhang, Xiyuan, Ding, Huahao, Zhang, Xingjian, Yuan, Weijie, Zhang, Tingting, Zhang, Qinyu

arXiv.org Artificial IntelligenceSep-9-2025

The rapid development of the low-altitude economy emphasizes the critical need for effective perception and intent recognition of non-cooperative unmanned aerial vehicles (UAVs). The advanced generative reasoning capabilities of multimodal large language models (MLLMs) present a promising approach in such tasks. In this paper, we focus on the combination of UAV intent recognition and the MLLMs. Specifically, we first present an MLLM-enabled UAV intent recognition architecture, where the multimodal perception system is utilized to obtain real-time payload and motion information of UAVs, generating structured input information, and MLLM outputs intent recognition results by incorporating environmental information, prior knowledge, and tactical preferences. Subsequently, we review the related work and demonstrate their progress within the proposed architecture. Then, a use case for low-altitude confrontation is conducted to demonstrate the feasibility of our architecture and offer valuable insights for practical system design. Finally, the future challenges are discussed, followed by corresponding strategic recommendations for further applications.

artificial intelligence, belief revision, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.06312

Country: Asia > China (0.29)

Genre: Research Report (0.84)

Industry:

Information Technology (0.88)
Government > Military (0.46)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Do Large Language Models Need Intent? Revisiting Response Generation Strategies for Service Assistant

Bolshinsky, Inbal, Kupiec, Shani, Sasson, Almog, Aperstein, Yehudit, Apartsin, Alexander

arXiv.org Artificial IntelligenceSep-8-2025

In the era of conversational AI, generating accurate and contextually appropriate service responses remains a critical challenge. A central question remains: Is explicit intent recognition a prerequisite for generating high-quality service responses, or can models bypass this step and produce effective replies directly? This paper conducts a rigorous comparative study to address this fundamental design dilemma. Leveraging two publicly available service interaction datasets, we benchmark several state-of-the-art language models, including a fine-tuned T5 variant, across both paradigms: Intent-First Response Generation and Direct Response Generation. Evaluation metrics encompass both linguistic quality and task success rates, revealing surprising insights into the necessity or redundancy of explicit intent modelling. Our findings challenge conventional assumptions in conversational AI pipelines, offering actionable guidelines for designing more efficient and effective response generation systems.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.05006

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition

Zhou, Qianrui, Xu, Hua, Wang, Yifan, Dong, Xinzhi, Zhang, Hanlei

arXiv.org Artificial IntelligenceSep-3-2025

Understanding human intents from multimodal signals is critical for analyzing human behaviors and enhancing human-machine interactions in real-world scenarios. However, existing methods exhibit limitations in their modality-level reliance, constraining relational reasoning over fine-grained semantics for complex intent understanding. This paper proposes a novel LLM-Guided Semantic Relational Reasoning (LGSRR) method, which harnesses the expansive knowledge of large language models (LLMs) to establish semantic foundations that boost smaller models' relational reasoning performance. Specifically, an LLM-based strategy is proposed to extract fine-grained semantics as guidance for subsequent reasoning, driven by a shallow-to-deep Chain-of-Thought (CoT) that autonomously uncovers, describes, and ranks semantic cues by their importance without relying on manually defined priors. Besides, we formally model three fundamental types of semantic relations grounded in logical principles and analyze their nuanced interplay to enable more effective relational reasoning. Extensive experiments on multimodal intent and dialogue act recognition tasks demonstrate LGSRR's superiority over state-of-the-art methods, with consistent performance gains across diverse semantic understanding scenarios. The complete data and code are available at https://github.com/thuiar/LGSRR.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.01337

Country:

Asia (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Multilingual Datasets for Custom Input Extraction and Explanation Requests Parsing in Conversational XAI Systems

Wang, Qianli, Anikina, Tatiana, Feldhus, Nils, Ostermann, Simon, Splitt, Fedor, Li, Jiaao, Tsoneva, Yoana, Möller, Sebastian, Schmitt, Vera

arXiv.org Artificial IntelligenceAug-22-2025

Conversational explainable artificial intelligence (ConvXAI) systems based on large language models (LLMs) have garnered considerable attention for their ability to enhance user comprehension through dialogue-based explanations. Current ConvXAI systems often are based on intent recognition to accurately identify the user's desired intention and map it to an explainability method. While such methods offer great precision and reliability in discerning users' underlying intentions for English, a significant challenge in the scarcity of training data persists, which impedes multilingual generalization. Besides, the support for free-form custom inputs, which are user-defined data distinct from pre-configured dataset instances, remains largely limited. To bridge these gaps, we first introduce MultiCoXQL, a multilingual extension of the CoXQL dataset spanning five typologically diverse languages, including one low-resource language. Subsequently, we propose a new parsing approach aimed at enhancing multilingual parsing performance, and evaluate three LLMs on MultiCoXQL using various parsing strategies. Furthermore, we present Compass, a new multilingual dataset designed for custom input extraction in ConvXAI systems, encompassing 11 intents across the same five languages as MultiCoXQL. We conduct monolingual, cross-lingual, and multilingual evaluations on Compass, employing three LLMs of varying sizes alongside BERT-type models.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.14982

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback